EVALUATION


The objective of this program is to develop, test, and evaluate a speaker-independent, on-line continuous/isolated speech recognition method for gisting audio/speech material. The system can recognize a vocabulary consisting of isolated words as well as connected phrases spoken in English in an unconstrained manner, independent of speaker and of channel.

Two sets of recognition tests were performed. The first test processed 46 different subjects (male and female) uttering a set of 48 connected triple-digit sequences. The testing resulted in 97.5 percent correct digit recognition for triple digits spoken in a normal manner.

The second test involved 25 different speakers testing a 40-word air traffic control vocabulary. A 99 percent correct word recognition score was achieved for this vocabulary.

This technology shall be used as an aid to analysts in various Air Force Command and Control functions.

RICHARD  S.  VONUSA 
Project  Engineer 
INTRODUCTION


The exploratory development model was implemented on a PDP-11/70 computer and incorporates basic algorithms developed on previous programs. A gisting scenario was developed for the collection of air traffic control data by means of voice input. The recognition vocabulary was increased to 50 words by the addition of control and descriptor words used to control the scenario and to enter simulated air traffic control data.

On-line operation has been achieved in a smoothly operating entry procedure, but real-time operation has not been fully achieved because of the difficulty of converting the present Fortran programs into faster machine-language equivalents, and the time this requires.

Speaker and channel independence have been improved through the speaker trainability of the algorithm. This algorithm was judged appropriate for the air traffic control gisting scenario, since operators would have an opportunity to train the system to their voices before actually using the gisting mode. They could retain their template files for future operating sessions, and could update or remake their files at any time. A new training session would be required to accommodate a new operator or a different channel frequency characteristic.

Substantial improvements have been made in recognition accuracy by broadening the highest-frequency channel of the analyzer filter bank, and by eliminating from the recognition process those quiet sounds which are comparable in spectral magnitude to the background noise. The broader filter now responds more reliably to /s/, /z/, etc., improving overall recognition accuracy. It was also found that previously obtained examples contained irrelevant and inconsistent spectral transitions as the quiet sounds emerged above or dropped below the noise level. A considerable improvement in accuracy was realized by raising the noise threshold and by taking measures to minimize the levels of noise sources, particularly those containing strong spectral peaks.

A gisting scenario has been developed and successfully demonstrated. It is an on-line algorithm for performing a simulated gisting task, in which gisting files are created, edited, appended, and stored in computer memory. Except for startup and file access, operation is entirely by voice. The simulated task is gisting in an air traffic control environment: an operator listens to an air traffic control channel through a headset and enters certain types of information by voice. In the simulated task, each gisting entry comprises a descriptor word spoken in isolation, followed by a connected digit group of any length that can be spoken within a 2.5-second time window.


Recognition experiments were performed using recordings of persons speaking randomly selected sets of descriptors followed by digit groups. These were first recorded on audio tape, then transcribed one word or digit group at a time into computer disk memory for later automatic collection of results.

Template files were constructed from a set of training utterances, which were not used in the tests. Tests were run for each vocabulary using the template files that the scenario would access automatically. Recognition results were tabulated, and overall results compiled for each vocabulary over 46 people. Overall performance was 97.6% on digits in connected groups, and 99% averaged over all command and descriptor words.

1.0  Exploratory  Development  Model. 

An exploratory development model (EDM) was constructed, consisting of hardware and software and primarily utilizing the PDP-11/70 computer system. The EDM incorporates basic algorithms developed under previous contracts and described in previous reports (1, 2).

The hardware configuration is shown in Figure 1. It consists of the PDP-11/70 CPU with a 40-Mbyte RP04 disk and a floating-point processor. A CRT terminal is used for instructing the operator through the training and recognition modes, and an RF04 disk system is used for storage of files, including programs, template files, and spectral representations of input utterances. Speech input is via audio recorder, or directly from a microphone or telephone line.

Speech is first preemphasized, then passed to the inputs of the parameter extractors, which include a 16-channel filter bank plus voicing and pitch extractors. Their outputs are digitized and converted to computer format in the Data Acquisition System, which is connected to the PDP-11/70 Unibus.
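The report does not give the preemphasis characteristic, and in the original hardware it was presumably an analog circuit ahead of the filter bank. As an illustration only, the sketch below uses the common first-order digital form y[n] = x[n] - a*x[n-1], with a = 0.95 as an assumed coefficient; the function name `preemphasize` is hypothetical. It shows the intended effect: low frequencies are attenuated relative to high ones, which helps weak high-frequency sounds such as /s/ reach the upper filters.

```python
from typing import List, Sequence

def preemphasize(samples: Sequence[float], a: float = 0.95) -> List[float]:
    """First-order preemphasis: y[n] = x[n] - a * x[n-1].

    Hypothetical helper; the coefficient `a` is an assumption, since the
    report does not specify the preemphasis used before the filter bank.
    """
    out: List[float] = []
    previous = 0.0  # treat the sample before the start as silence
    for x in samples:
        out.append(x - a * previous)
        previous = x
    return out

# A constant (0 Hz) input is strongly attenuated after the first sample...
print(preemphasize([1.0, 1.0, 1.0, 1.0]))
# ...while a signal alternating at the Nyquist rate is amplified by (1 + a).
print(preemphasize([1.0, -1.0, 1.0, -1.0]))
```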

The EDM described above was implemented on a DEC PDP-11/70 and written in FORTRAN IV+. It is an integral part of an overall system that includes a TU-15 9-track tape drive, additional CRT and hard-copy terminals, and a high-speed printer. Also interfaced to the PDP-11/70 is a PDP-11/10 dedicated to controlling the input and output of a GT-42 interactive graphics terminal.

1.1  Recognition  Algorithm. 

A speaker-trainable algorithm was used, since high accuracy was desired in the gisting environment. The basic speaker-trainable algorithm developed on an earlier contract is described in previous reports. Changes and improvements were made, resulting in faster and more reliable operation. These changes were directed toward real-time on-line operation, speaker and channel independence, and higher recognition accuracy. Although real-time operation was not realized, the major portion of these objectives was achieved.

1.1.1.  Real-Time,  On-Line  Processing. 

During the contract, extensive effort was devoted to the realization of on-line processing, but real-time performance was not achieved. Real-time operation requires reprogramming all algorithms into time-efficient machine-language programs, and also extensive time sharing between the data input and data processing segments. While simple in concept, conversion to real-time operation was not accomplished during the contract because of the extent and complexity of the programming needed.

1.1.2.  Speaker  and  Channel  Independence. 

The nature of the air traffic control gisting task suggested the use of a speaker-trainable algorithm, since a new operator of the gisting program would usually need only a single training session. The speaker-trainable algorithm has the advantage of high accuracy on the people who train it, and it also obviates the need for speaker and channel transformations, since these are inherent in the training process. It was found, however, that channel and background noise, if large in amplitude and concentrated in narrow spectral bands, could reduce recognition accuracy. This is discussed further in the following subsection.

1.1.3.  Improvements  in  Recognition  Accuracy. 

After carefully examining the results of the previous contract, it was found that there was an excessive number of errors involving "six" and "seven", particularly when spoken by women. This was found to be due to a concentration of /s/ energy above 5 kHz, with practically no energy within the range of the 16 analyzer filters. The /s/ was, in effect, not "heard" by the recognition program, and too little spectral information remained for accurate identification of words containing /s/.
This problem was practically solved by simply broadening the 16th filter, making it more responsive to energy above 5 kHz. Formerly its center frequency was 4325 Hz; it was changed to a broader response with a center frequency of 6000 Hz.

This change was made before any templates were made for the present contract, since templates made with the former filter complement would not properly match the new input utterances. As a result of the change, accuracy has improved overall, even for many utterances not containing /s/.

Inconsistencies were found in the spectral representations used on the previous contract. These seemed to occur in regions where the raw input amplitude was low, though still of definite significance. The problem was identified as the appearance of erroneous spectral transitions caused by the mixture of low-amplitude speech and spectrally peaked background noise. A peaked noise source was identified in the room where data were usually recorded. The problem was practically solved by 1) removing all spectrally peaked noise sources from rooms used in recording, and 2) raising the noise threshold above the level at which the remaining background and channel noise would have a significant effect.
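Raising the noise threshold so that frames near the noise floor are excluded can be sketched as follows. The report gives neither the threshold value nor how the noise level was measured, so the per-frame dB levels, the quantile-based noise-floor estimate, the 6 dB margin, and the function names `estimate_noise_floor` and `keep_mask` are all assumptions for illustration.

```python
from typing import List, Sequence

def estimate_noise_floor(levels_db: Sequence[float], quantile: float = 0.1) -> float:
    """Hypothetical estimate of background noise: a low quantile of frame levels.

    Assumes the recording contains some silence, so the quietest frames are
    dominated by background and channel noise.
    """
    ordered = sorted(levels_db)
    index = int(quantile * (len(ordered) - 1))
    return ordered[index]

def keep_mask(levels_db: Sequence[float], noise_floor_db: float,
              margin_db: float = 6.0) -> List[bool]:
    """True for frames that clear the raised threshold (floor + assumed margin).

    Frames within `margin_db` of the floor are comparable in magnitude to the
    noise and are dropped, so sounds hovering near the noise level cannot
    introduce spurious spectral transitions.
    """
    threshold = noise_floor_db + margin_db
    return [level > threshold for level in levels_db]

levels = [30.0, 31.0, 29.0, 55.0, 60.0, 58.0, 34.0, 30.0]  # illustrative dB values
floor = estimate_noise_floor(levels)
print(floor, keep_mask(levels, floor))
```

With the margin removed, the 34 dB frame would be kept even though it sits only a few dB above the noise; that is the kind of quiet, noise-contaminated frame the raised threshold is meant to exclude.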

Marked improvement in performance has been demonstrated as a result of the above modifications. In many cases only one template per digit is needed to achieve good recognition accuracy, which was usually not possible with the earlier system.
1.2 Gisting Scenario.

The gisting scenario simulates the task of gisting in an air traffic control environment: the entry, by voice, of air traffic control information by an operator who is listening to an air traffic control channel through a headset. The vocabulary and scenario comprise a representative subset of possible air traffic control tasks. The on-line gisting scenario was developed during the contract and was successfully demonstrated to representatives of the contracting agency.

1.2.1.  The  Gisting  Task. 

Gisting files are made and stored in computer memory as a record of air traffic control activity. Entries are made into gisting files by an operator who is listening to an air traffic control channel. Gisting data could be entered via keyboard, but then the operator must be a skilled typist. The subject contract was for the creation of a feasibility model for voice input to the gisting files.

1.2.2.  Simulation  of  the  Air  Traffic  Control  Environment. 

Simulated gisting tasks were devised, and specific modes and vocabulary subsets were defined to accommodate the tasks. Since the resulting system was experimental, additional test modes were incorporated into the scenario. Two gisting modes were incorporated, corresponding to two particular types of file data: 1) a descriptor word (altitude, time, etc.) followed by a digit group up to 2.5 seconds in length, and 2) an alphabet character (alpha, bravo, etc.) followed by a digit group. The simulated gisting task comprises operating the gisting scenario by voice using control words, and entering by voice lists of phrases in the above two formats. Except for initiation and operator changes, the scenario is controlled entirely by voice, without the use of any keyboard functions.

1.2.3.  The  Gisting  Vocabulary. 

The vocabulary words used in the scenario are shown in Table I. The digits are recognized in connected mode, while all other words must be spoken singly. The experimental results were obtained using Control Group 1, while the live demonstration used Control Group 2. In addition to normal gisting, each vocabulary except Control Group 1 can be accessed for testing within the gisting scenario.

1.2.4.  Gisting  Scenario  Block  Diagram. 

Figure 1 is a block diagram of the gisting scenario. "Initiation" comprises keyboard entry of a speaker code, for example "HK1". The CRT then displays the question "Do I know you?", to which a response of "Yes" or "No" is entered by voice. If "Yes" is spoken, the operator begins to make normal scenario entries. If "No" is spoken, the training mode is accessed; however, this segment of the scenario was incomplete at the end of the report period.

Normal operation of the scenario begins with the question "What would you like to do?" and a list of options. Voice entry of "New" opens a new gisting file, while entry of "Old" accesses an existing gisting file. In the latter case, the file identity is entered by keyboard in the form month/day/hour of the file's creation.
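The first two decision points of the scenario can be sketched as simple transition tables. The node names, the function names, and the zero-padded month/day/hour format are hypothetical; the "New", "Old", "Yes", and "No" transitions follow the text, while those for "test", "train", and "quit" are inferred because the scenario diagram is not reproduced here. Any unrecognized word simply leaves the operator at the same question.

```python
from datetime import datetime

def do_i_know_you(word: str) -> str:
    """Hypothetical node reached after keyboard entry of a speaker code such as "HK1"."""
    return {"yes": "what_would_you_like_to_do",
            "no": "training_mode"}.get(word, "do_i_know_you")  # otherwise ask again

def what_would_you_like_to_do(word: str) -> str:
    """Hypothetical dispatch for the main question of the scenario."""
    transitions = {
        "new": "enter_descriptor",    # open a new gisting file
        "old": "keyboard_file_id",    # existing file, identified by keyboard entry
        "test": "test_mode",          # inferred: test the vocabularies, no file created
        "train": "training_mode",     # inferred: incomplete in the report period
        "quit": "exit",               # inferred
    }
    return transitions.get(word, "what_would_you_like_to_do")

def gisting_file_id(created: datetime) -> str:
    """File identity as month/day/hour of creation (zero padding is an assumption)."""
    return f"{created.month:02d}/{created.day:02d}/{created.hour:02d}"

print(do_i_know_you("yes"), what_would_you_like_to_do("old"))
print(gisting_file_id(datetime(1979, 3, 7, 14)))
```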

The test mode does not create a gisting file, but permits the operator to test the various vocabularies. The scenario guides the operator through accessing each vocabulary and speaking any of the words in the selected vocabulary. Recognition results are displayed on the CRT for visual verification.

The "train", "quit", and "top" functions are evident from the diagram, although, as mentioned above, the automatic training function had not been completed at the time of this report. A "disable/enable" operation inhibits the response to, and entry of, irrelevant speech when the operator so desires. Note that this function appears at each entry node, permitting the operator to suspend voice entry at any point in the scenario.

The scenario signals its readiness to accept entries into a gisting file with the instruction "Enter descriptor". The operator may enter gisting data, or access additional functions either before any file entries or after any number of entries. The additional functions are "skip", which enters a blank line or terminates a partially complete line of gisting data; "show", which displays the latest form of the gisting file; "quit", which exits the program; "alphabet", which switches to the alphabet descriptor vocabulary; and "backspace", which permits reentry of the previous line. The "disable/enable" functions are, of course, also operative at this node.
When the "alphabet" command is spoken, control is transferred to the "Enter alphabet" node, a second descriptor node operating on the same gisting file. Return to the normal descriptor node is by the spoken "Descriptor" command. In other respects, the "Enter alphabet" node is identical to the "Enter descriptor" node described above.

A gisting file entry comprises a descriptor input followed by a digit group, or an alphabet character followed by a digit group. Note that there are two identical "Enter digits" nodes, reached according to whether a normal descriptor or an alphabet character was entered first. In addition to digit group entry, additional functions are available at these nodes: "Skip" terminates the gisting data line with no digit entry; "Quit" exits the program; "Backspace" deletes the present line and permits its reentry by succeeding commands; and, of course, the "disable/enable" function is available.

A line of gisting data comprises a descriptor (normal or alphabet) followed by a group of connected digits. Program control is strictly by voice. Upon recognition of a descriptor, control goes to the "Enter digits" node. Upon recognition of a digit group, control advances through an "Advance line" block to the next line of the gisting file.
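The descriptor, alphabet, and digits nodes described in this subsection behave like a small state machine. The sketch below assumes the recognizer delivers one word at a time, with a connected-digit group arriving as a string of digits; the class, method, and node names are hypothetical, the vocabularies come from Table I, and details the report leaves open (such as "backspace" on an empty file) are guesses. The text calls the correction command "backspace" while Table I lists "backup"; the sketch uses the text's name.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DESCRIPTORS = {"altitude", "airspeed", "beaconcode", "time", "aircraft", "departure",
               "holding", "release", "sector", "number", "temperature", "right",
               "left", "forward", "reverse", "register", "heading", "runway"}
ALPHABET = {"alpha", "bravo", "charlie", "delta", "echo", "foxtrot",
            "golf", "hotel", "india", "juliette", "kilo", "lima"}

@dataclass
class GistingSession:
    """Hypothetical model of the scenario's entry nodes."""
    lines: List[str] = field(default_factory=list)  # the gisting file
    node: str = "enter_descriptor"   # current node
    home: str = "enter_descriptor"   # descriptor node to return to after a line
    pending: Optional[str] = None    # descriptor awaiting its digit group
    enabled: bool = True             # False while voice entry is disabled

    def hear(self, word: str) -> Optional[List[str]]:
        """Process one recognized word; returns a copy of the file on "show"."""
        if word in ("enable", "disable"):        # available at every node
            self.enabled = word == "enable"
            return None
        if not self.enabled or self.node == "exit":
            return None                          # irrelevant speech is ignored
        if word == "quit":
            self.node = "exit"
        elif self.node == "enter_digits":
            self._digits_node(word)
        else:
            return self._descriptor_node(word)
        return None

    def _descriptor_node(self, word: str) -> Optional[List[str]]:
        vocabulary = DESCRIPTORS if self.node == "enter_descriptor" else ALPHABET
        if word == "show":
            return list(self.lines)
        if word == "skip":
            self.lines.append("")                # enter a blank line
        elif word == "backspace" and self.lines:
            self.lines.pop()                     # previous line may be re-entered
        elif word == "alphabet":
            self.node = self.home = "enter_alphabet"
        elif word == "descriptor":
            self.node = self.home = "enter_descriptor"
        elif word in vocabulary:
            self.pending, self.node = word, "enter_digits"
        return None

    def _digits_node(self, word: str) -> None:
        if word.isdigit():                       # a recognized connected-digit group
            self.lines.append(f"{self.pending} {word}")
        elif word == "skip":
            self.lines.append(self.pending or "")  # line terminated with no digits
        elif word != "backspace":
            return                               # anything else: stay at this node
        self.pending, self.node = None, self.home  # "Advance line" (backspace discards)

session = GistingSession()
for spoken in ["altitude", "7000", "alphabet", "foxtrot", "882", "show"]:
    shown = session.hear(spoken)
print(shown)
```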

This  section  has  described  the  gisting  scenario  as  developed 
during  the  report  period.  A  computer  printout  of  the  operational 
scenario  and  a  description  of  its  operation  are  given  in  Appendix  A. 
2.0 EXPERIMENTAL SET-UP AND PROCEDURES

2.1  Test  Phrases. 

Recordings were made on magnetic tape of utterances to be used as training and design examples. The training examples consisted of two examples of each vocabulary subset shown in Table I, with the addition of a set of digit pairs for use in training the machine to recognize connected digits. Table II lists the training utterances as recorded by each test speaker.

Test phrases were randomly chosen by drawing chips from a dish without replacement. Table III is one example. In the case of control words, five examples of each control word are spoken. Note that each column of the table is in a different random order. Normal mode phrases are made up of pairs of utterances: a descriptor and an appropriate group of connected digits. Alpha phrases are made up of pairs of an alpha character and a group of three connected digits. There are 50 control word utterances, 144 normal mode utterances (72 phrases), and 96 alpha utterances (48 phrases), for a total of 290 test utterances, or 170 phrases. A connected digit group was considered to be a single utterance.
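Drawing chips from a dish without replacement amounts to taking a random permutation of the chips. The sketch below regenerates lists in the style of Table III under stated assumptions: the function names and the seed are hypothetical; each alphabet character is assumed to appear four times (48 alpha phrases over 12 characters); and normal-mode digit groups are not generated, because the report does not say what made a digit group "appropriate" for a given descriptor. The last lines repeat the utterance bookkeeping from the paragraph above.

```python
import random
from typing import List, Tuple

CONTROL_GROUP_1 = ["yes", "no", "entry", "file", "return",
                   "stop", "exit", "normal", "skip", "go"]
ALPHABET_WORDS = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot",
                  "golf", "hotel", "india", "juliette", "kilo", "lima"]

def control_word_columns(rng: random.Random, repeats: int = 5) -> List[List[str]]:
    """Hypothetical: one column per repetition, each a fresh draw of all chips."""
    return [rng.sample(CONTROL_GROUP_1, k=len(CONTROL_GROUP_1)) for _ in range(repeats)]

def alpha_phrases(rng: random.Random, repeats: int = 4) -> List[Tuple[str, str]]:
    """Hypothetical: alpha character plus three connected digits, each character `repeats` times."""
    chips = ALPHABET_WORDS * repeats
    rng.shuffle(chips)   # drawing every chip without replacement = a random permutation
    return [(word, f"{rng.randrange(1000):03d}") for word in chips]

rng = random.Random(1979)   # fixed (arbitrary) seed so a list can be regenerated
columns = control_word_columns(rng)
phrases = alpha_phrases(rng)

# Bookkeeping from Section 2.1: a connected digit group counts as one utterance.
control_utts = sum(len(column) for column in columns)   # 5 x 10 = 50
normal_utts = 2 * 72                                     # 72 descriptor + digit-group phrases
alpha_utts = 2 * len(phrases)                            # 48 phrases -> 96 utterances
print(control_utts + normal_utts + alpha_utts)           # 290
```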

Training  and  test  utterances  were  recorded  at  the  same  sitting 
and  under  the  same  conditions.  Conditions  of  the  recordings  are 
described  in  the  next  subsection. 

2.2  Data  Recording. 

All  recordings  were  made  in  a  quiet  area  and  particular  effort 
was  made  to  assure  a  reasonable  acoustic  environment,  free  of 
TABLE I

GISTING SCENARIO VOCABULARIES

Digits    Control Group 1    Control Group 2
0         yes                old        commands
1         no                 new        words
2         entry              test       quit
3         file               train      enable
4         return             top        disable
5         stop               backup     alphabet
6         exit               digits
7         normal
8         skip
9         go

Descriptor Words              Alphabet Words
altitude      number          alpha      golf
airspeed      temperature     bravo      hotel
beaconcode    right           charlie    india
time          left            delta      juliette
aircraft      forward         echo       kilo
departure     reverse         foxtrot    lima
holding       register
release       heading
sector        runway


PLEASE GIVE YOUR NAME.

PLEASE READ THESE COLUMNS FROM TOP TO BOTTOM. PAUSE AFTER EACH WORD.

0    0     38   38     yes      yes        old        old
1    1     31   31     no       no         new        new
2    2     18   18     entry    entry      test       test
3    3     28   28     file     file       train      train
4    4     48   48     return   return     top        top
5    5     68   68     stop     stop       backup     backup
6    6     88   88     exit     exit       digits     digits
7    7     41   41     normal   normal     commands   commands
8    8     61   61     skip     skip       words      words
9    9                 go       go         quit       quit
                                           enable     enable
                                           disable    disable

PLEASE READ THESE COLUMNS FROM TOP TO BOTTOM. PAUSE AFTER EACH WORD.

altitude       altitude       alpha       alpha
airspeed       airspeed       bravo       bravo
beaconcode     beaconcode     charlie     charlie
time           time           delta       delta
aircraft       aircraft       echo        echo
departure      departure      foxtrot     foxtrot
holding        holding        golf        golf
release        release        hotel       hotel
sector         sector         india       india
number         number         juliette    juliette
temperature    temperature    kilo        kilo
right          right          lima        lima
left           left
forward        forward
reverse        reverse
register       register
heading        heading
runway         runway

Table II
A. CONTROL WORDS


Normal 

No 

skip 

go 

go 

normal

exit 

skip 

no 

entry 

yes 

return 

stop 

exit 

return 

yes 

entry 

stop 

file 

file 

Skip 

File 

file 

go 

yes 

exit 

normal

yes 

no 

normal

exit

no 

entry 

stop 

return 

skip 

go 

return 

stop 

entry

B. NORMAL MODE PHRASES


Departure 

Release 

0425 

308 

airspeed 

time

180 

1807 

right 

holding 

84 

7000 

left

temperature

35

38 

aircraft 

register 

738 

34 

runway 

forward 

72 

40 

sector 

reverse 

658 

48 

heading 

beaconcode 

087 

658 

altitude 

mater 

4000 

588 

Sector 

Temperature

769 

83 

runway

mariser 

29 

287 

heading 

register 

155 

74 

departure

0436 

time

1706 

aircraft 

tawtne 

941 

76 

right 

xunfara 

17 

42 

airspeed 

release 

425 

637 

beaconcode 

holding 

595

700 

left 

altitude 

04 

400 

Departure 

Heading 

2024 

359 

temperature

right 

57 

51

altitude 

release 

3000 

121 

holding 

left 

6000 

80 

airspeed 

forward 

275 

49 

reverse 

register 

88 

44 

beaconcode 

89 

985 

number

aircraft 

955 

521

time

sector 

0340 

731 

C.  ALPHA  PHRASES 


Foxtrot  lima
700  660 

delta  alpha 

552  935

india  charlie

990  691 

kilo  echo 

341  961 

hotel  juliette 

408  146 

golf  bravo 

823  197 


Foxtrot 

882 

juliette 

340 

bravo 

147 

lima

968 

kilo 

962 

Charlie 

521 


Hotel 

633 

India 

559 

echo 

957 

delta 

654 


golf 

670 


Lima 

Juliette 

774 

978 

hotel 

Charlie 

230 

306 

golf 

echo 

025 

363 

foxtrot 

delta 

105 

210 

alpha 

kilo 

966 

473 

bravo 

India 

941 

183 

Reverse 

22 

time

2139 

release 

971 

register 

92 


heading

163 

sector 

432 


departure 

0819 

left 

75 

forward 

63 


Delta 

365 

Charlie 

159 

India 

269 

lima

791 

hotel 

330 

kilo 

703 


Normal

return 

file 

exit 

stop 

skip 

entry 

no 

yes 

go 


Holding 

8000 

aircraft 

154 

beaconcode 

759 

temperature 

47 

runway

46 

right 

42 

airspeed 

250 

number 

644 

altitude 

2000 


Golf 

394 

bravo 

217 

foxtrot 

495 

echo 

611 

juliette 

545 

alpha 

594 


Table III


Test  Utterances 
excessive echoes and reverberation. Some recordings were made in our laboratory, some in private homes, and some in suitable rooms of public buildings.

Recordings were made at 3 3/4 inches per second on one channel of a TEAC 2300S stereo recorder, using an Electro-Voice Model 651 portable microphone system. The operator briefed and coached each test subject and monitored the recording by means of a headset. If a speaker made an error in reading or mispronounced a word, the operator stopped the tape recorder and re-recorded the erroneous utterance.

Speakers  fit  roughly  into  three  categories: 

1)  Those  experienced  with  Speech  Recognition. 

2)  Those  experienced  with  microphone  use. 

3)  The  general  public. 

There were men and women in all three categories. The best results were obtained for categories 1) and 2), as will be seen from the results in Section 3.

2.3  Test  Procedure. 

The data base was generated as a set of template files, such that each speaker had one template file for each vocabulary subset (command words, descriptors, alphabet characters, and connected digits). These were used in conjunction with a word definition file for each vocabulary subset to obtain results indicating the accuracy of the recognition algorithms.

2.3.1.  Data  Base  Generation. 

The data base was generated by hand with the aid of a GT-42 Interactive Graphics Terminal. The audio tape recorder output was fed to the recognizer filter bank and its associated parameter circuits. The operator entered one utterance at a time (word or connected digit group) from the tape into the computer. The result was displayed as an amplitude plot and a plot of correlation against the spectrum at peak amplitude. Templates were made by observing the syllabic structure of the utterance, then entering selected samples; the selected samples were taken by means of the time-warp computer algorithm and corresponded to the first and second halves of each syllable. In some cases there were too few selected samples to allow the syllable to be broken into two parts, and in that case only a single template was made for the entire syllable. There were, of course, corresponding word files in which word definitions were entered for every valid combination of template sequences.
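The template-making rule above (one template for each half of a syllable, or a single template when there are too few selected samples) can be sketched as follows. This is an illustration under assumptions, not the contractor's procedure: each selected sample is taken to be a list of filter-bank channel values; a half-syllable template is taken to be the channel-wise average of its samples, since the report does not say how samples were combined; and the cutoff `min_frames = 4` and the function names are hypothetical.

```python
from typing import List, Sequence

Frame = List[float]  # one filter-bank spectrum, e.g. 16 channel values (assumed)

def mean_frame(frames: Sequence[Frame]) -> Frame:
    """Channel-by-channel average of a group of spectral frames."""
    count = len(frames)
    return [sum(channel) / count for channel in zip(*frames)]

def syllable_templates(frames: Sequence[Frame], min_frames: int = 4) -> List[Frame]:
    """Hypothetical: two half-syllable templates, or one for a short syllable.

    `frames` are the selected samples for one syllable, in time order;
    `min_frames` is an assumed cutoff below which the syllable is not split.
    """
    if not frames:
        raise ValueError("a syllable needs at least one selected sample")
    if len(frames) < min_frames:
        return [mean_frame(frames)]
    middle = len(frames) // 2
    return [mean_frame(frames[:middle]), mean_frame(frames[middle:])]

print(syllable_templates([[0.0, 0.0], [2.0, 2.0], [4.0, 4.0], [6.0, 6.0]]))
print(syllable_templates([[1.0, 3.0], [3.0, 5.0]]))
```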

In the case of connected digits, a number of alternative templates were usually required for some of the digits to achieve the desired accuracy, but for the remaining vocabulary subsets very few alternative templates were used at all. For the most part, the template-making task for all vocabulary subsets of each person required about an hour on average.

2.3.2.  Testing. 

Test utterances were transferred from audio tape to the RP04 disk so that automatic testing could be done. Each utterance was carefully stored in its own file under control of the operator. Once the utterances were on the disk, an entire vocabulary subset could be run automatically, producing the raw test results analyzed in Section 3.

The operator loaded the proper template file and word file for each test, corresponding to the speaker and vocabulary subset, then ran the automatic recognition program. The recognition program compared each input utterance against each template file entry to obtain a series of syllable responses and their corresponding scores. These were then applied to the word criteria portion of the recognition program for further refinement and, ultimately, a decision as to the identity of the input utterance.
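The report does not describe how each utterance segment is scored against the templates, nor the word criteria logic. As an assumed stand-in, the sketch below uses textbook dynamic time warping (DTW), the general technique the "time-warp" algorithm of Section 2.3.1 suggests, with a Euclidean frame distance to score each syllable segment against every template; it then looks up the best-matching template sequence in a word file keyed by valid template sequences. Segmentation into syllables is assumed to be done already, and all names (`dtw_distance`, `recognize`, the template labels) are hypothetical. The original word criteria were presumably richer than an exact lookup.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Frame = Sequence[float]

def dtw_distance(a: Sequence[Frame], b: Sequence[Frame]) -> float:
    """Classic DTW distance between two frame sequences (Euclidean frame cost)."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(segments: Sequence[Sequence[Frame]],
              templates: Dict[str, Sequence[Frame]],
              word_file: Dict[Tuple[str, ...], str]
              ) -> Tuple[Optional[str], List[Tuple[str, float]]]:
    """Hypothetical: score each segment against every template, then apply the word file.

    Returns the word (None if the template sequence is not a valid combination)
    and the syllable responses as (best template, score) pairs.
    """
    responses = []
    for segment in segments:
        scores = {name: dtw_distance(segment, tpl) for name, tpl in templates.items()}
        best = min(scores, key=scores.get)
        responses.append((best, scores[best]))
    sequence = tuple(name for name, _ in responses)
    return word_file.get(sequence), responses

templates = {"six_a": [[0.0, 9.0]], "six_b": [[5.0, 5.0]], "seven_a": [[9.0, 0.0]]}
word_file = {("six_a", "six_b"): "six", ("seven_a", "six_b"): "seven"}
print(recognize([[[0.0, 8.0], [1.0, 9.0]], [[5.0, 5.0]]], templates, word_file))
```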




3.0  RESULTS 

3.1  Performance  Data. 

The results are presented in two sections. The first section presents the raw data as printed out by the speech recognition system. The second section is an analysis of the results by word and speaker categories.

Fifty speakers were recorded as described in the previous section; the results are tabulated in Table 3-1. The table shows results for 46 speakers; the other four recordings were unacceptable because over 25% of the recorded utterances contained pauses. The pauses make the results more indicative of a discrete word recognition system, and these recordings were therefore not used. The results in Table 3-1 are for the connected digits. There were 48 utterances of 3 digits each, for a total of 144 digits for each of the 46 speakers. The overall correct recognition rate, including errors from all sources, was 97.49%. The result for the upper 90th percentile of speakers (the 4 worst speakers removed) was 98%.
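The scale of the connected-digit test follows directly from the figures above. The calculation below is only arithmetic on the reported numbers; the error count is back-computed from the rounded 97.49% rate, so it is approximate, and the variable names are illustrative. It also shows that dropping the 4 worst of 46 speakers keeps about 91% of them, which is what the report calls the upper 90th percentile.

```python
SPEAKERS = 46
UTTERANCES = 48              # triple-digit sequences per speaker
DIGITS_PER_UTTERANCE = 3

digits_per_speaker = UTTERANCES * DIGITS_PER_UTTERANCE   # 144
total_digits = SPEAKERS * digits_per_speaker             # 6624
approx_errors = round(total_digits * (1 - 0.9749))       # approximate, from the 97.49% rate
kept_fraction = (SPEAKERS - 4) / SPEAKERS                # share left after dropping 4 speakers

print(digits_per_speaker, total_digits, approx_errors, round(100 * kept_fraction, 1))
```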

The full vocabulary of command words, descriptor words, and alpha words was tested for 25 speakers. These were selected as good talkers based on their performance in the connected digit tests. The results for the three vocabulary subsets are shown in Table 3-2. The data indicate an overall correct recognition of 99% for the whole vocabulary of 40 words. Broken down by subset, the results show that the Alpha and Descriptor vocabularies were recognized with 99.1% correct recognition and the Control vocabulary with 98.9%.
but the training set was obtained in 1979. A summary of the comparison is shown in Table 3-3.

3.1.2.  Final  Demonstration. 

The  results  of  the  final  demonstration  were  obtained  in  two 


Speaker   Speaker   Number of errors                   Average % of
#         Name      Alpha    Command    Descriptors    correct recognition
1         AD        0        1          1              98.8
2         AK        0        1          2              98.2
3         CM        1        2          1              97.6
4         RC        0        0          1              99.4
5         HK        0        0          0              100.0
6         LF        1        1          1              98.2
7         JX        0        0          0              100.0
8         FX        0        1          1              98.8
9         MB        0        0          0              100.0
10        AS        2        0          1              98.2
11        TS        0        0          1              99.4
12        WO        0        0          0              100.0
13        ST        1        1          0              98.8
14        DR        0        0          1              99.4
15        1M        0        1          1              98.8
16        RP        0        0          1              99.4
17        BX        0        2          0              98.8
18        OS        1        0          0              99.4
19        DO        0        1          0              99.4
20        FH        1        1          0              98.8
21        JV        0        0          2              98.8
22        TO        2        0          1              98.2
23        LC        0        2          1              98.2
24        m         0        0          1              99.4
25        Bfi       2        0          0              98.8

Table 3-2: Results for the Three Subsets of the Vocabulary. Entries in the Alpha, Command, and Descriptors columns are numbers of errors. Each error represents 1.4% in the Descriptor subset and 2% in the Alpha and Command subsets. The average % correct recognition is for the vocabulary as a whole, averaged over the three subsets.
                    % Correct recognition
Speaker #   Name     1978 SRS    1979 SRS
    1       MB         96.9        98.1
    2       LF         98.8        98.8
    3       AD         98.8       100.0
    4       HJC        95.6        98.1
    5       DD         97.5        97.5
    6       OC         95.6        97.5

Table 3-3: Comparison of 1978 and 1979 SRS
(Speech Recognition Systems). Results are
based on test data recorded in 1978.


parts: a) a spot check of test results, and b) live testing.

For  the  first  part  the  RADC  representative  chose  two  speakers 
at  random  from  the  file.  The  test  results  were  verified  by  a  test 
run  using  the  procedure  outlined  in  Section  2.3.2. 

The live test consisted of a demonstration of the Gisting
operation. The RADC representative trained the system by recording
one set of digits spoken in a discrete manner. He also recorded a
set of ten double-digit sequences. Based on these recordings a set
of reference patterns was obtained containing one sample for each
digit except the digits 2, 3 and 8, which had two representative
patterns each. The test consisted of reading a list of fifty
triple-digit sequences, with a performance accuracy of 93.3%. Most
of the errors were in the digits 2 and 8. The digit 8 accounted for
70% of the errors, all of them omission errors, indicating that it
had an insufficient number of representative templates. Two of PTC's
personnel, AD and HK, used the system to enter 25 data lines into a
table using voice gisting only. AD was able to enter the 25 data
items using 27 statements; HK did it using 26 statements.
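The live-test percentages can be turned into approximate error counts. Fifty triple-digit sequences give 150 digits, so 93.3% accuracy implies about 10 digit errors, and 70% of those (about 7) were omissions of the digit 8. These counts are inferred from the quoted percentages, not stated in the report. The short Python check below reproduces them.

```python
def implied_errors(n_items, accuracy_pct):
    """Number of errors implied by an accuracy quoted to one decimal place."""
    return round(n_items * (100.0 - accuracy_pct) / 100.0)


digits = 50 * 3                          # fifty triple-digit sequences
errors = implied_errors(digits, 93.3)    # inferred, not reported
eights = round(0.70 * errors)            # 70% of the errors were the digit 8
print(digits, errors, eights)
print(f"check: {100.0 * (digits - errors) / digits:.1f}% correct")
```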

3.2 Analysis of Performance Data.

The results indicate that an overall accuracy of 97.5% for
connected digits was achieved. In order to analyze these results,
the speakers were categorized into groups according to sex and
according to whether they were experienced or inexperienced talkers.

The confusion matrix for the digits for the whole group of 46
speakers is shown in Table 3-4. The digit "two" accounts for the
largest number of omission errors (7.2% of all errors). This is
because a high acceptance threshold was set for this word in order
to avoid extraneous recognitions. Extraneous recognitions of the
digit "two" were only 1.8% of the errors, indicating that there is
room for a better setting of the threshold. For the digit "eight"
the threshold seems optimal, since omission and extraneous errors
are roughly balanced (5.4% and 6.6%). The major source of errors was
confusion between the digits "one" and "nine", which together
accounted for 13.2% of the errors. These errors are due to the
coarticulation problem that occurs when "one" is preceded by digits
ending with a nasal, and "nine" is preceded by digits ending with an
"n". It is interesting to note that the traditional "five"-"nine"
confusion was eliminated. The "one"-"nine" type of error is worst
for female speakers. Table 3-5
RECOGNIZED


0 

1 

2 

3 

4 

5

6 

7 

8 

9 

9 

0 

1.2 

3.0 

0.6 

3.6 

3.0 

1.2 

1 

2.4 

6.0 

3.6 

2 

1.2 

2.4 

0.6 

7.2 

3 

6.6 

4.8 

4 

1.2 

3.6 

3.0 

5 

1.8

0.6 

6 

0.6 

0.6 

1.8 

1.2 

7 

2.4 

3.0 

0.6 

8 

2.4 

2.4 

5.4 

9 

7.2 

0.6 

1.2 

E 

1.2 

1.8 

0.6 

2.4 

0.6 

6.6 

1.2 

Table 3-4: Confusion Matrix for Connected Digits for
46 Male and Female Speakers. The errors are in percent (%),
computed by normalizing the number of errors in each
element by the total number of errors. The E in the
"SPOKEN" column means an extraneous word was recognized
even though it was not spoken.


shows the confusion matrix for the 9 female speakers. The overall
performance for female speakers was 96.7% versus 97.8% for male
speakers; however, the distribution of errors was markedly different.
Table 3-6 shows the confusion matrix for the 37 male speakers. The
highest number of errors is due to omission errors for the digit
"two" (9.8%) and to "three"-"two" confusion (7.3%). For female
speakers some of the errors are also due to "two"-"three" confusion
(8.9%). However, the largest number of errors was due to the
"nine"-"one" (15.6%) and "one"-"nine" (13.3%) confusions. The fact
that these two digits account for 29% of the errors indicates a
deficiency in the recognition of nasals.
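The discussion of "two" and "eight" describes a per-word acceptance threshold that trades omissions against extraneous recognitions. The report does not give its scoring scale. The printout in Appendix A shows a three-digit number after each recognition, but its meaning is not stated. The sketch below therefore assumes that a higher match score is better and uses hypothetical scores for one word. With these assumptions, raising the threshold can only add omissions and remove extraneous recognitions.

```python
def threshold_errors(spoken_scores, spurious_scores, threshold):
    """Count omissions and extraneous recognitions for one word.

    spoken_scores:   best match scores when the word really was spoken
    spurious_scores: best match scores for the word when it was not spoken
    A candidate is accepted when its score is at least `threshold`
    (assumption: higher score = better match).
    """
    omissions = sum(1 for s in spoken_scores if s < threshold)
    extraneous = sum(1 for s in spurious_scores if s >= threshold)
    return omissions, extraneous


# Hypothetical scores for one vocabulary word.
spoken = [170, 175, 180, 185, 190]
spurious = [165, 172, 178]
for t in (170, 175, 180):
    om, ex = threshold_errors(spoken, spurious, t)
    print(f"threshold {t}: {om} omissions, {ex} extraneous")
```

A "balanced" threshold, as described for "eight", is one where neither count dominates. The "two" case corresponds to a threshold set high enough that omissions outweigh extraneous recognitions.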

The  results  for  the  Alpha,  Descriptor  and  Control  vocabularies 
indicate  correct  recognition  of  99%.  A  confusion  matrix  is  not  needed 
since  96%  of  all  errors  are  errors  of  omission. 


RECOGNIZED 


n 

0 

i 

2 

3 

4 

5 

6 

7 

8 

9 

? 

a 

D 

2.2 

D 

■ 

■ 

a 

■ 

D 

■ 

■ 

13.3 

■ 

2 

2.2 

_ 

■ 

■ 

■ 

■ 

3 

D 

■ 

■ 

■ 

■ 

D 

a 

D 

n 

5 

2.2 

■ 

6 

2.2 

■ 

2.2 

■ 

2.2 

7 

n 

■ 

■ 

8 

2.2 

9 

15.6 

E 

D 

2.2 

Table 3-5: Confusion Matrix for Connected Digits for
9 Female Speakers. The errors are in percent (%),
computed by normalizing the number of errors in each
element by the total number of errors. The E in the
"SPOKEN" column means an extraneous word was
recognized even though it was not spoken.
RECOGNIZED


0 

1 

2 

3 

4 

5 

6 

7 

8 

9 

7 

0 

4.1 

3.3 

4.1 

1.6 

1 

1.6 

3.3 

4.9 

2 

9.8 

3 

7.3 

4.1 

4 

4.9 

2.1 

5 

1.6 

0.8 

6 

0.8 

1.6 

0.8 

7 

3.3 

2.4 

0.8 

3 

3.3 

3.3 

6.5

9 

4.1 

0.8 

1.6 

E 

1.6 

2.4 

1.6 

3.3 

1.6 

7.3 

0.8 

Table 3-6: Confusion Matrix for Connected Digits for
37 Male Speakers. The errors are in percent (%),
computed by normalizing the number of errors in each
element by the total number of errors. The E in the
"SPOKEN" column means an extraneous word was
recognized even though it was not spoken.
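The captions of Tables 3-4 through 3-6 normalize each cell by the total number of errors, not by the number of times a digit was spoken. As a result, the cells of one table sum to 100%, and each cell reads as a share of all errors. This is also why two cells can be added, as with the 15.6% and 13.3% for the female speakers, to get the 29% quoted for the "one"/"nine" pair. The Python sketch below shows the normalization. Two parts of it are assumptions: the layout (None as the recognized value for an omission, "E" as the spoken value for an extraneous recognition) and the sample events, which are hypothetical.

```python
from collections import Counter

OMITTED = None      # assumed marker: a spoken digit that produced no recognition
EXTRANEOUS = "E"    # "spoken" label for a word recognized when nothing was spoken


def normalized_errors(error_events):
    """Return {(spoken, recognized): % of all errors} for a list of error events."""
    counts = Counter(error_events)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cell: 100.0 * n / total for cell, n in counts.items()}


# Hypothetical error events, not data from Tables 3-4 to 3-6.
events = ([("2", OMITTED)] * 4 + [("1", "9")] * 2 + [("9", "1")] * 2
          + [(EXTRANEOUS, "8")] * 2)
matrix = normalized_errors(events)
for (spoken, recognized), pct in sorted(matrix.items(), key=lambda kv: -kv[1]):
    shown = "(omitted)" if recognized is OMITTED else recognized
    print(f"spoken {spoken:>2}  recognized {shown:>9}: {pct:5.1f}%")
```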


The group of experienced talkers, drawn from police and fire
department personnel and from our own laboratory, had the best
performance record. The 25 speakers in this category had an overall
correct recognition of 98.4% for connected digits and 99% for the
rest of the vocabulary.


4.0  CONCLUSIONS 

A voice-activated interactive command and control system has been
constructed and demonstrated. The system is used as a test bed to
evaluate various voice-activated command and control tasks, as well
as performance levels for several groups of speakers. The results
are as follows:

1.  Connected  digits  are  recognized  with  an  accuracy  of  98.4% 
for  talkers  with  experience  in  speaking  situations. 

2.  Command  and  control  vocabularies  are  recognized  with  an 
accuracy  of  99%. 

3.  The  performance  for  female  speakers  is  within  1%  of  the 
performance  for  male  speakers  although  the  system  is  optimized  for 
male  speakers. 

4.  For  experienced  speakers  performance  results  on  connected 
digits  are  practically  the  same  on  old  and  new  test  data  using  one 
set  of  templates. 

It is not obvious whether this performance level is adequate
for many applications, but it is felt that the high degree of
interaction naturalness and the fast throughput achieved make this
system a good starting point and a test bed for voice command and
control applications.
APPENDIX A

Gisting  Scenario  Operation 

This appendix describes the gisting scenario operation with
reference to a printout of the responses that would normally appear
on the CRT terminal. A test run was made through all of the gisting
nodes, and the resulting printout is given on pages A5 through A12.

The gisting program was started by means of keyboard entries,
including initialization to speaker HK1. Printed responses to the
initialization instructions are shown at the top of page A5. After
initialization the program asks, "Do I know you?", to which the
operator answers "yes" by voice input. The program then asks "What
would you like to do?" and lists the options. The operator answers
"new", indicating that a new gisting file is to be created. The
normal gisting mode is entered automatically; the program asks for a
descriptor, and the operator responds by saying, in this case,
"altitude" into the microphone. A descriptor must be followed by a
digit group in this mode, and the operator enters the digit group
"300" in connected speech. The program displays the gisting file
entry "Altitude = 300" and asks for the next descriptor. Instead of a
descriptor, the operator says "quit", causing the program to return
to the previous menu and ask again "What would you like to do?" The
operator says "disable", but the program does not recognize the
utterance. It gives error messages (bottom of page A5) and again
prints the main menu at the top of page A6.
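In the normal gisting mode every entry is a descriptor word followed by a spoken digit group, and a command word such as "quit" may be said in place of a descriptor. The sketch below models that rule over a list of already-recognized tokens. The class and function names are hypothetical, and the code is not the PTC program itself. It only imitates what the printout shows: the first entry of a new file is numbered 2, and labels are cut to seven characters ("ALTITUD", "AIRCRAF"). The descriptor list is partial and taken from the printout.

```python
DIGITS = set("0123456789")
# Partial descriptor list, taken from words seen in the printout.
DESCRIPTORS = {"altitude", "aircraft", "airspeed", "beaconcode", "time"}


class GistFile:
    """Hypothetical model of a gisting file: numbered (label, digits) lines."""

    def __init__(self, first_line=2):
        # The printout numbers the first entry of a new file as line 2.
        self.entries = []
        self.next_line = first_line

    def add(self, label, digit_group):
        if not digit_group or not set(digit_group) <= DIGITS:
            raise ValueError("a descriptor must be followed by a digit group")
        # Labels are cut to 7 characters, as in "ALTITUD" and "AIRCRAF".
        self.entries.append((self.next_line, label[:7].upper(), digit_group))
        self.next_line += 1


def gist_descriptor_mode(tokens, gist):
    """Read descriptor/digit-group pairs until "quit"; return unread tokens."""
    it = iter(tokens)
    for word in it:
        if word == "quit":
            break
        if word not in DESCRIPTORS:
            raise ValueError(f"expected a descriptor or command, got {word!r}")
        gist.add(word, next(it, ""))
    return list(it)


gist = GistFile()
rest = gist_descriptor_mode(["altitude", "300", "quit", "disable"], gist)
for line, label, digits in gist.entries:
    print(f"{line} > {label} = {' '.join(digits)}")
print("left for the main menu:", rest)
```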
The operator says "disable" a second time. The program recognizes
it and inhibits further input. When he is ready to resume, the
operator says "enable", the program is enabled, and the previous
menu is again displayed. The operator next says "old", indicating
that he wants to append to or edit an existing file. Note that the
program asks for a typed entry here. The operator types "9, 25, 10",
which are the month, day, and hour at which the above new file was
created. The program asks for a descriptor, but instead the operator
says "show" to display the contents of the accessed file. Note that
"altitude 300" is printed, corresponding to the known content of the
previous file. The program asks for a descriptor and the operator
says "aircraft". The operator then says the connected digit group
"279".
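The "disable"/"enable" pair works as a voice-controlled mute. Once disabled, the gister ignores speech until it hears "enable", and then it redisplays the menu it was on. The sketch below models that gate; its names are hypothetical. The report does not say whether speech heard while disabled was discarded or buffered, and this version simply discards it.

```python
class VoiceGate:
    """Hypothetical model of the disable/enable behaviour."""

    def __init__(self):
        self.enabled = True

    def filter(self, word):
        """Return the word if it should be acted on, else None."""
        if not self.enabled:
            if word == "enable":
                self.enabled = True
                print("THE GISTER IS ENABLED")
            return None                  # everything else is ignored
        if word == "disable":
            self.enabled = False
            print("THE AUDOMAT IS NOW DISABLED")
            print("TO CONTINUE, PLEASE SAY 'ENABLE'")
            return None
        return word


gate = VoiceGate()
heard = ["disable", "old", "enable", "old"]
acted_on = [w for w in heard if gate.filter(w) is not None]
print(acted_on)
```

Only the second "old" is acted on, because the first one arrives while the gate is disabled.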

The line "Aircraft = 279" is displayed at the top of page A7, and
then the program asks for the next descriptor. The operator says
"alphabet", a command word that switches to the alternate gisting
mode (alphabet character followed by digit group). The computer
responds with "enter alphabet". The operator says "alpha", which is
recognized, then "123", after which the recognized gisting line
"alpha = 123" is displayed. The command word "show" is then spoken
and the program displays the contents of the gisting file. The first
entry was made upon creation of the file; the second and third were
added in the "old" file mode, the third using the alternate gisting
mode with alphabet characters as descriptors. The operator next says
"quit", thereby returning to the gisting menu, and then says "test"
to enable the test menu, which indicates the vocabularies that can
be tested.
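In the alternate mode the descriptor slot takes an alphabet word instead. The command list also swaps "alphabet" for "descriptor", so the operator can switch back. The sketch below replays the "old"-file part of the session as a token list. It is a hypothetical model of the behaviour visible in the printout (partial word lists, line numbering from 2, seven-character labels), not the program's own logic. Run as written, its second "show" prints the same three lines as the printout.

```python
# Partial word lists, taken from the printout; the real lists are longer.
VOCAB = {
    "descriptor": {"altitude", "aircraft", "airspeed", "beaconcode", "time"},
    "alphabet": {"alpha", "bravo", "charlie"},
}


def run_gisting(tokens, entries=None):
    """Apply recognized tokens to a gisting file; return its (label, digits) list."""
    entries = [] if entries is None else entries
    mode = "descriptor"                       # normal gisting mode
    it = iter(tokens)
    for word in it:
        if word == "quit":
            break
        if word in VOCAB:                     # "alphabet" / "descriptor" switch modes
            mode = word
        elif word == "show":
            for line, (label, digits) in enumerate(entries, start=2):
                print(f"{line}: {label} {digits}")
        elif word in VOCAB[mode]:
            digits = next(it, "")
            if not digits.isdigit():
                raise ValueError(f"{word!r} must be followed by a digit group")
            entries.append((word[:7].upper(), digits))
        else:
            print("SORRY, I DIDN'T UNDERSTAND WHAT YOU SAID.")
    return entries


# Replays the "old" file session from the printout.
entries = run_gisting(
    ["show", "aircraft", "279", "alphabet", "alpha", "123", "show", "quit"],
    entries=[("ALTITUD", "300")],
)
```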

At the top of page A8, the program asks "Which vocabulary do
you want to try?" and lists the vocabularies that can be accessed.
In this case the operator says "digits", thereby accessing the
connected digits mode. The operator speaks six digit groups,
illustrating the flexibility of the connected digit recognizer to
recognize groups up to 6 or more digits in length. The operator then
says "backup" to return to the vocabulary test menu. The program
asks "Do you really want to backup?", to which the operator answers
"yes". At first the program does not understand what was said, but
responds when "yes" is repeated a second time.

The vocabulary test menu appears again at the top of page A9,
and the "alphabet" vocabulary is tested. The operator speaks
"alpha", "bravo" and "charlie", and the program responds with the
correct recognition each time. He then says "backup" and returns
to test the vocabulary of "words", or descriptors.

On page A10, the word vocabulary is tested. The operator first
disables the gister by voice, then resumes by speaking "enable".
The spoken word input "altitude" is not recognized at first, but is
recognized when spoken a second time. Vocabulary words "altitude",
"airspeed", "beaconcode" and "time" are spoken and the program
responds correctly. Similarly, at the bottom of page A10 the command
word vocabulary is accessed for testing.

The command word vocabulary test is done on page A11. All
words in command group No. 2 are spoken and correctly recognized.
Certain command words are needed for program control, and when one
of these is recognized the program asks "Do you really want to
___?", where "train", "backup" or "quit" is inserted in the blank.

This  test  continues  at  the  top  of  page  A12. 

At the end of the command word test at the top of page A12,
the operator says "backup" once to get to the vocabulary test menu,
then again to get back to the main menu. At this time he says "quit",
which must be verified by saying "yes". The program then exits, and
the exercise of the gisting scenario is completed.
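The closing sequence follows two rules: "train", "backup" and "quit" take effect only after a "yes", and each "backup" moves up one menu level. The sketch below models those rules under stated assumptions. The menu names are hypothetical. The sketch confirms every control word, although in the printout the second "backup", from the vocabulary test menu, goes straight to the main menu without a prompt. An unrecognized reply is simply asked for again, as with the "PLEASE REPEAT" in the printout.

```python
# Control words that the sketch confirms before acting on them.
NEEDS_CONFIRMATION = {"train", "backup", "quit"}
# One "backup" climbs one level; the main menu is the top (hypothetical names).
PARENT = {
    "commands test": "vocabulary test menu",
    "vocabulary test menu": "main menu",
    "main menu": "main menu",
}


def confirmed(replies, max_tries=2):
    """Read replies until "yes" or "no"; anything else is asked again."""
    for _ in range(max_tries):
        answer = next(replies, None)
        if answer in ("yes", "no"):
            return answer == "yes"
    return False


def navigate(words, menu):
    """Apply spoken control words; return the final menu, or None on exit."""
    it = iter(words)
    for word in it:
        if word not in NEEDS_CONFIRMATION:
            continue                      # ordinary vocabulary words
        print(f"DO YOU REALLY WANT TO {word.upper()} ?")
        if not confirmed(it):
            continue
        if word == "quit":
            return None
        if word == "backup":
            menu = PARENT[menu]
        # "train" would start template re-training here (not modelled).
    return menu


print(navigate(["backup", "yes", "backup", "yes", "quit", "yes"], "commands test"))
```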


RUN GISTER

CHANGE  NPRrNSH  ?  (Y*..)  > 

CHANGE  BOUNDARY/THRESH/INTRO  ADJUSTMENT?  > 
DEBUG  CORREL?  > 

HELLO, THIS IS THE P.T.C. GISTER.

PLEASE  TYPE  YOUR  (3)  INITIALS* 

HK1 


DO  I  KNOW  YOU? 


WHAT  WOULD  YOU  LIKE  TO  DO? 

NEW  (CREATE  NEW  FILE) 

OLD  (CONTINUE  OLD  FILE) 

TEST  (TEST  TEMPLATES) 

TRAIN 

QUIT 

TOP 


you  said::::  new 
186 

2 > ENTER DESCRIPTOR (AIRSPEED, AIRCRAFT, ETC)

COMMAND ( QUIT, SKIP, BACKUP, SHOW, ALPHABET )


you said:::: altitud
180

2 > ALTITUD =>


you said:::: 300

167 179 185

2 > ALTITUD = 3 0 0

3 > ENTER DESCRIPTOR (AIRSPEED, AIRCRAFT, ETC)

COMMAND ( QUIT, SKIP, BACKUP, SHOW, ALPHABET )


you  said::::  quit 
186 


WHAT  WOULD  YOU  LIKE  TO  DO? 

NEW  (CREATE  NEW  FILE) 

OLD  (CONTINUE  OLD  FILE) 

TEST  (TEST  TEMPLATES) 

TRAIN 
QUIT 




SORRY, I DIDN'T UNDERSTAND WHAT YOU SAID.


AUDMAT: COULD YOU REPEAT THAT PLEASE


WHAT  WOULD  YOU  LIKE  TO  DO? 

NEW  (CREATE  NEW  FILE) 

OLD  (CONTINUE  OLD  FILE) 

TEST  (TEST  TEMPLATES) 

TRAIN 

QUIT 

TOP 


you  said::::  disable 
182 

THE  AUDOMAT  IS  NOW  DISABLED 
TO CONTINUE, PLEASE SAY 'ENABLE'
THE  GISTER  IS  ENABLED 


WHAT  WOULD  YOU  LIKE  TO  DO? 

NEW  (CREATE  NEW  FILE) 

OLD  (CONTINUE  OLD  FILE) 

TEST  (TEST  TEMPLATES) 

TRAIN 

QUIT 

TOP 


you  said::::  old 

174 

type:  ENTER  MONTH. DAY .HOUR (0-24) .DESIRED  FILE  9.25.10 
3 > ENTER DESCRIPTOR (AIRSPEED, AIRCRAFT, ETC)

COMMAND ( QUIT, SKIP, BACKUP, SHOW, ALPHABET )


you said:::: show
182

2: ALTITUD 300

3 > ENTER DESCRIPTOR (AIRSPEED, AIRCRAFT, ETC)

COMMAND ( QUIT, SKIP, BACKUP, SHOW, ALPHABET )


you  said::::  aircraf 

183 

3 > AIRCRAF =>


you  said::::  279 

172  182  180 

3  >  AIRCRAF  =2  7  9 

4 > ENTER DESCRIPTOR (AIRSPEED, AIRCRAFT, ETC)

COMMAND ( QUIT, SKIP, BACKUP, SHOW, ALPHABET )


you  said::::  alphabt 
180 

4 > ENTER ALPHABET (ALPHA, BRAVO, CHARLIE, ... )

COMMAND ( QUIT, SKIP, BACKUP, SHOW, DESCRIPTOR )


you  said::::  alpha 

184 

4  >  ALPHA  => 


you  said::::  123 

188  174  180 

4  >  ALPHA  =1  2  3 

5 > ENTER ALPHABET (ALPHA, BRAVO, CHARLIE, ... )

COMMAND ( QUIT, SKIP, BACKUP, SHOW, DESCRIPTOR )


you  said::::  show 

190 

2:  ALTITUD  300 
3:  AIRCRAF  279 
4:  ALPHA  123 

5 > ENTER ALPHABET (ALPHA, BRAVO, CHARLIE, ... )

COMMAND ( QUIT, SKIP, BACKUP, SHOW, DESCRIPTOR )


you  said::::  quit 
186 


WHAT WOULD YOU LIKE TO DO?

NEW  (CREATE  NEW  FILE) 

OLD  (CONTINUE  OLD  FILE) 

TEST  (TEST  TEMPLATES) 

TRAIN 

QUIT 

TOP 


you  said::::  test 

185 


WHICH  VOCABULARY  DO  YOU  WANT  TO  TRY? 

DIGITS 

ALPHABET 

WORDS 

COMMANDS 


you said:::: digits

174


PLEASE  SAY  ANY  SUBSET  OF  THE  DIGITS 

YOU  MAY  CONTINUE  SPEAKING  AFTER  I  HAVE  RESPONDED 

IF I MAKE TOO MANY MISTAKES, PLEASE SAY 'TRAIN' AND RE-TRAIN ME.
OF COURSE YOU CAN SAY 'BACKUP' AND ASK FOR ANOTHER VOCABULARY


you said:::: 1
187


you said:::: 1 2
186 186


you said:::: 1 2 3
186 174 171


you said:::: 1 2 3 4
187 174 177 177


you said:::: 1 2 3 4 5
185 177 174 180 185


you said:::: 1 2 3 4 5
187 174 175 175 185


you said:::: backup
188

audmat:  do  you  really  want  to  backup  ? 


PLEASE  REPEAT 
WHICH VOCABULARY DO YOU WANT TO TRY?

DIGITS 

ALPHABET 

WORDS 

COMMANDS 


you  said::::  alphabt 
186 


PLEASE  SAY  ANY  SUBSET  OF  THE  ALPHABT 

YOU  MAY  CONTINUE  SPEAKING  AFTER  I  HAVE  RESPONDED 

IF I MAKE TOO MANY MISTAKES, PLEASE SAY 'TRAIN' AND RE-TRAIN ME.
OF COURSE YOU CAN SAY 'BACKUP' AND ASK FOR ANOTHER VOCABULARY


you  said::::  alpha 
188 


you  said::::  bravo 

190 


you  said::::  charlie 

190 


you  said::::  backup 
180 


audmat:  do  you  really  want  to  backup  ? 


WHICH  VOCABULARY  DO  YOU  WANT  TO  TRY? 

DIGITS 

ALPHABET 

WORDS 

COMMANDS 


you  said::::  words 

184 
PLEASE SAY ANY SUBSET OF THE WORDS

YOU MAY CONTINUE SPEAKING AFTER I HAVE RESPONDED

IF I MAKE TOO MANY MISTAKES, PLEASE SAY 'TRAIN' AND RE-TRAIN ME.
OF  COURSE  YOU  CAN  SAY  'BACKUP'  AND  ASK  FOR  ANOTHER  VOCABULARY 


you  said::::  disable 

184 

THE  AUDOMAT  IS  NOW  DISABLED 
TO CONTINUE, PLEASE SAY 'ENABLE'
THE GISTER IS ENABLED


SORRY, I DIDN'T UNDERSTAND WHAT YOU SAID.


you  said::::  altitud 

179 


you  said::::  airspee 

167


you  said::::  beaconc 

179 


you  said::::  time 
182 


you  said::::  backup 

183 


audmat: do you really want to backup ?


PLEASE  REPEAT 


WHICH  VOCABULARY  DO  YOU  WANT  TO  TRY? 

DIGITS 

ALPHABET 

WORDS 

COMMANDS 


you  said::::  commans 

178 
PLEASE SAY ANY SUBSET OF THE COMMANS

YOU MAY CONTINUE SPEAKING AFTER I HAVE RESPONDED

IF I MAKE TOO MANY MISTAKES, PLEASE SAY 'TRAIN' AND RE-TRAIN ME.
OF COURSE YOU CAN SAY 'BACKUP' AND ASK FOR ANOTHER VOCABULARY


you said:::: old
177 


you  said::::  new 
182 


you  said::::  test 
186 


you  said::::  train 

191 


audmat:  do  you  really  want  to  train  ? 


you  said::::  top 

190 


you  said::::  backup 

187 


audmat:  do  you  really  want  to  backup  ? 


you  said::::  digits 
176 


you  said::::  commans 

184


you  said::::  words 
186 


you  said::::  quit 
188 


AUDMAT: DO YOU REALLY WANT TO QUIT ?


you said:::: disable
182

THE AUDOMAT IS NOW DISABLED
TO CONTINUE, PLEASE SAY 'ENABLE'
THE GISTER IS ENABLED


you  said::::  alphabt 

184 


you  said::::  backup 

184 


audmat:  do  you  really  want  to  backup  ? 


WHICH  VOCABULARY  DO  YOU  WANT  TO  TRY? 

DIGITS 

ALPHABET 

WORDS 

COMMANDS 


you  said::::  backup 

177 


WHAT  WOULD  YOU  LIKE  TO  DO? 

NEW  (CREATE  NEW  FILE) 

OLD  (CONTINUE  OLD  FILE) 

TEST  (TEST  TEMPLATES) 

TRAIN 

QUIT 

TOP 


you  said::::  quit 
188 


ARE  YOU  SURE  YOU  WANT  TO  QUIT? 


NICE  CHATTING  WITH  YOU  HK1 
BYE 

TT0 -- STOP


